Slope heuristics for variable selection and clustering via Gaussian mixtures
Authors
Abstract
Specific Gaussian mixtures are considered in order to solve variable selection and clustering problems simultaneously. A penalized likelihood criterion was proposed in Maugis and Michel (2008) to choose the number of mixture components and the relevant variable subset. This criterion depends on unknown multiplicative constants that must be approximated in practice. A “slope heuristics” method is proposed and tested to address this practical problem. Numerical experiments on simulated datasets, a curve clustering example, and a genomics application illustrate the value of the proposed heuristics.

Key-words: Model-based clustering, Variable selection, Penalized likelihood criterion, Slope heuristics, Curve clustering.

Author affiliations (∗ and †): INRIA Futurs, Projet select, Université Paris-Sud 11

French title: Heuristique de pente pour la sélection de variables et la classification non supervisée via des mélanges gaussiens

inria-00284620, version 2, 4 Jun 2008
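As a rough illustration of the slope heuristics idea mentioned in the abstract, the sketch below calibrates an unknown multiplicative constant from the data. It is not the procedure of Maugis and Michel (2008), whose exact penalty shape is not given on this page. It assumes a penalty simply proportional to the model dimension, estimates the slope of the maximized log-likelihood against dimension over the most complex models, and uses twice that slope as the penalty constant. The function names and the toy numbers are hypothetical.

```python
def fit_slope(dims, logliks):
    """Ordinary least-squares slope of log-likelihood against model dimension."""
    n = len(dims)
    mean_d = sum(dims) / n
    mean_l = sum(logliks) / n
    sxx = sum((d - mean_d) ** 2 for d in dims)
    sxy = sum((d - mean_d) * (l - mean_l) for d, l in zip(dims, logliks))
    return sxy / sxx


def slope_heuristic_select(dims, logliks, n_largest):
    """Pick a model with a data-driven penalty (hypothetical helper).

    Assumption: the penalty is kappa * dimension, with kappa unknown.
    The slope kappa_hat is estimated on the n_largest most complex
    models, and the final penalty is 2 * kappa_hat * dimension.
    Returns (index of selected model, kappa_hat).
    """
    order = sorted(range(len(dims)), key=lambda i: dims[i])
    tail = order[-n_largest:]
    kappa_hat = fit_slope([dims[i] for i in tail], [logliks[i] for i in tail])
    # Penalized criterion: minus log-likelihood plus twice the minimal penalty.
    crit = [-logliks[i] + 2.0 * kappa_hat * dims[i] for i in range(len(dims))]
    best = min(range(len(dims)), key=lambda i: crit[i])
    return best, kappa_hat


# Toy, made-up values: the log-likelihood rises sharply up to dimension 6,
# then grows linearly with slope 1 (pure overfitting regime).
toy_dims = [2, 4, 6, 8, 10, 12, 14, 16]
toy_logliks = [-500.0, -420.0, -380.0, -378.0, -376.0, -374.0, -372.0, -370.0]
best_index, kappa_hat = slope_heuristic_select(toy_dims, toy_logliks, n_largest=4)
```

On these toy values the estimated slope is 1 and the selected model is the one of dimension 6. In practice the choice of how many of the largest models to use for the slope fit matters, and the result should be checked for stability across several values of `n_largest`.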
Similar resources
Ranking Pharmaceutics Industry Using SD-Heuristics Approach
In recent years stock exchange has become one of the most attractive and growing businesses in respect of investment and profitability. But applying a scientific approach in this field is really troublesome because of variety and complexity of decision making factors in the field. This paper tries to deliver a new solution for portfolio selection based on multi criteria decision making literatu...
BYY harmony learning, structural RPCL, and topological self-organizing on mixture models
The Bayesian Ying-Yang (BYY) harmony learning acts as a general statistical learning framework, featured by not only new regularization techniques for parameter learning but also a new mechanism that implements model selection either automatically during parameter learning or via a new class of model selection criteria used after parameter learning. In this paper, further advances on BYY harmon...
A non asymptotic penalized criterion for Gaussian mixture model selection
Specific Gaussian mixtures are considered to solve simultaneously variable selection and clustering problems. A non asymptotic penalized criterion is proposed to choose the number of mixture components and the relevant variable subset. Because of the non linearity of the associated Kullback-Leibler contrast on Gaussian mixtures, a general model selection theorem for MLE proposed by Massart (200...
Mesh Segmentation Using Laplacian Eigenvectors and Gaussian Mixtures
In this paper a new completely unsupervised mesh segmentation algorithm is proposed, which is based on the PCA interpretation of the Laplacian eigenvectors of the mesh and on parametric clustering using Gaussian mixtures. We analyse the geometric properties of these vectors and we devise a practical method that combines single-vector analysis with multiple-vector analysis. We attempt to charact...
Bayesian Clustering with Variable and Transformation Selections
The clustering problem has attracted much attention from both statisticians and computer scientists in the past fifty years. Methods such as hierarchical clustering and the K-means method are convenient and competitive first choices off the shelf for the scientist. Gaussian mixture modeling is another popular but computationally expensive clustering strategy, especially when the data is of high...
Publication date: 2008